ABSTRACT

Differences in science achievement between males and 
females have been examined either directly or indirectly in a variety 
of studies. This investigation reviewed a quantitative synthesis of 
correlational research on science affect, ability, and achievement 
conducted by Steinkamp and Maehr. Their findings were reassessed by 
employing a meta-analysis approach which used tests for fitting 
categorical models to effect sizes. The reexamination focused on 
explanations of the reported differences in science achievement 
between males and females as well as on the role of measurement 
variables in the size of the gender differences. Results indicated 
that though gender differences tended to favor males, even 
significant differences were slight, and gender differences for many
subsets of studies were not significant. The size of the gender 
difference depended in part on the science subject matter being 
tested and also on the type of measure used in the studies. A 
reference list and a list of synthesized studies are appended. 
(ML) 
The Measurement of Science Achievement
and Its Role in Gender Differences



Betsy Jane Becker and Lin Chang 
Michigan State University 



Abstract 

It is often thought that women have not achieved in the area of science 
to the same degree as have men. Various studies have either directly or 
indirectly examined differences in science achievement between males and 
females, and have found contradictory results. This quantitative review or 
meta-analysis of studies of gender differences in science achievement 
addresses the question of whether males have higher achievement levels in 
science than females, and explores the role of measurement variables in the 
size of the gender differences. 

We review a set of studies of sex differences in science achievement 
gathered from two earlier reviews by Steinkamp and Maehr. Steinkamp and 
Maehr conducted a quantitative synthesis of correlational research on 
science affect, ability, and achievement in which they reported conclusions 
based on average correlations. Later, they reviewed another set of studies 
(using effect sizes) and again reported only the average effect size across 
all studies.

The purpose of our study is to reexamine part of Steinkamp and Maehr's
work by focusing on explanations for the differences in science achievement
between males and females, using Hedges's approach to meta-analysis. Our
review also improves upon the earlier reviews by applying Glass's effect
sizes together with Hedges's tests for model fit to the studies of
science-achievement gender differences collected by Steinkamp and Maehr.

Results indicated that though gender differences tended to favor males, 
even significant differences were slight, and gender differences for many 
subsets of studies were not significant. The size of the gender difference 
depended in part on the science subject matter being tested and also on the
type of measure used in the studies. 

Though the subject-matter content of the science-achievement measure 
was significantly related to the size of science gender differences, 
unexplained variation in the gender differences still remained after all of 
our measurement-related explanatory factors had been explored. 
The Measurement of Science Achievement
and its Role in Gender Differences 

In the past decade issues of gender equity in education and the 
workplace have received increasing concern (e.g., Brickley, Garfunkel & 
Hulsizer, 1979). A long-standing finding which has received some attention 
is the dearth of women in scientific careers (e.g., National Science 
Foundation, 1977). This scarcity has been linked by some to earlier gender 
differences in science achievement (e.g., DeBoer, 1984). 

Two recent reviews have summarized many studies which allow the 
examination of gender differences in science achievement (Steinkamp & Maehr, 
1983, 1984). These reviews report results from several hundred samples 
which seem to indicate a general superiority of males on measures of science 
achievement, though considerable variation in the size and even the
direction of the gender difference can be discerned across samples. 

In this paper we attempt to understand more about gender differences in 
science achievement by reanalyzing the studies reviewed by Steinkamp and 
Maehr. By examining variation in the gender differences and the
relationship of achievement differences to explanatory variables we hope to 
gain a better understanding of the interrelationship of gender and science 
achievement. We consider as possible explanatory variables features of the 
studies which relate to the measurement of science achievement. Our 
analyses should enable us to hypothesize about possible causes of any 
differences that are found, or alternatively to eliminate some potential 
causes from consideration. 

Our paper begins with a rationale for the synthesis and brief 
discussions of measurement issues and of gender differences in science 
achievement. The methods for this review are next discussed, and are 
compared briefly to those used by Steinkamp and Maehr. Our findings and a 
discussion conclude the paper. 
Rationale for the Review

Steinkamp and Maehr (1983) reported a quantitative synthesis of
correlations among affect, ability, and achievement in science, and between
each of these variables and gender. Their primary aim in examining the
literature on cognitive and attitudinal origins of science achievement was
to determine whether science instruction should focus especially on
affective outcomes.

Steinkamp and Maehr summarized correlation coefficients between gender 
and science achievement from 15 studies, using t tests to examine the 
significance of the average correlations. One of Steinkamp and Maehr's 
conclusions was that boys achieve slightly better than girls in science, and 
they elaborated on that finding by calculating average correlations for 
samples grouped by the subject matter of the science test used and by the 
school grade level of the subjects in the sample. Steinkamp and Maehr did 
not, however, statistically examine the variation in their results. 

Steinkamp and Maehr's second review (1984) was primarily concerned with
gender differences in motivational orientations toward science achievement,
though they summarized results on gender differences in achievement as well.
Using Glass's (1976) effect size to represent the extent of gender
differences, the authors found that across 406 samples, boys' achievement
averaged slightly better than that of girls. Though they reported averages
and standard deviations for sex differences based on studies from different
sources (i.e., articles versus standardized-test manuals), Steinkamp and
Maehr did not focus on achievement gender differences in this review.

The variability in the results from both of Steinkamp and Maehr's 
reviews raises questions about their conclusions. One might ask whether the 
average correlations or effect sizes they report are representative of all
their study results. That is, one could investigate whether the studies
share one population correlation or one population effect size. If not,
average correlations (or mean differences) could be misleading in that they
would not accurately describe results in all (or perhaps any) of the
studies. One could also examine the similarities or differences between the
correlational results and the results from noncorrelational studies. More
important, however, is the question of variation in the size of the gender
differences. If we can find other variables which relate to the sizes of
the sex differences, we may begin to understand reasons for and causes of
any differences that do exist. These issues will guide our analyses in this
paper.

Measurement and Gender Differences 

We focus in this review on issues in the measurement of science 
achievement. Because the construct of science achievement is most often 
represented by the scores obtained by students on science achievement tests 
or the grades that they obtain in science classes, the instruments have much 
to do with the outcomes of quantitative reviews. 

Several obvious features of the measures of science achievement seemed 
likely to relate to gender differences in science achievement. Though little 
work has been done in the area of science achievement, research in other 
content domains has shown that test length, test content (specifically
within-subject differences in content such as geometry versus algebra in
mathematics), and test format may relate to gender differences in
performance (Dwyer, 1979).

Test length and the speededness of a test have not been shown to
consistently relate to gender differences, though Dwyer (1979) noted that 
speeded tests in masculine contexts have been found to favor males. 

Some evidence (Finn, Dulberg, & Reis, 1979) suggests that gender 
differences cross-nationally relate to both test content and student age. 
Evidence from the study of the International Association for the Evaluation 
of Educational Achievement (IEA) suggested that gender differences were
smaller for younger students and increased to almost a full standard
deviation advantage for males by the end of secondary school. Boys were
noted to excel the most in physical sciences, and less in biology. In some
countries girls outperformed boys in biology.

Also, flaws or biases in the construction of the measures may relate
to the size of the gender differences on the measures, though such 
relationships may be complex or difficult to discover without detailed data 
about the author(s) of the instrument and details about the method of 
construction. 

In this analysis we examine test content, test length, and student 
grade, as well as some other features of the measurement and analysis of 
science achievement. 

Meta-analysis of Gender Differences 

Methods of integrative reviewing have developed much since Glass (1976) 
introduced the effect size and the idea of meta-analysis. Glass suggested 
the effect size, or standardized difference between the means of a treatment 
and a control group, as a scale-free measure of treatment effect. (In this 
review the effect size serves as a measure of gender differences.) Glass 
felt that this quantitative index could be used to combine study results in 
a quantitative and more objective fashion, following the procedures and 
rigor required of primary data analysis. 

Hedges (1981) pointed out that Glass's meta-analysis is designed to
draw inferences about a population effect size, which he called $\delta$, through
analyses of sample estimates. Early meta-analyses (including those by
Steinkamp and Maehr) used intuitively sensible and familiar statistical
analyses such as t tests and analysis of variance to analyze sample effect
sizes. However, those methods are usually not appropriate for the analysis
of effect-size data because they require the assumption of homogeneous
variances across units (in this case, studies). Since effect sizes are
based on mean differences, their variances, like the standard error of the 
sample mean, will depend on the sizes of the samples for which they are 
calculated. Thus, variances of effect sizes are usually heterogeneous. 
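
The dependence of effect-size variance on sample size can be illustrated numerically. The sketch below assumes the large-sample variance approximation for a standardized mean difference that is usually credited to Hedges (1981); the function name and the two sample sizes are hypothetical and chosen only for illustration.

```python
def effect_size_variance(n_male, n_female, d):
    """Approximate large-sample variance of a standardized mean difference.

    Assumed formula: (n_M + n_F) / (n_M * n_F) + d**2 / (2 * (n_M + n_F)).
    """
    n_total = n_male + n_female
    return n_total / (n_male * n_female) + d ** 2 / (2 * n_total)


# Hypothetical studies with the same effect size but very different sample sizes.
small_study = effect_size_variance(15, 15, 0.16)
large_study = effect_size_variance(500, 500, 0.16)

print(f"variance with 15 per group:  {small_study:.4f}")
print(f"variance with 500 per group: {large_study:.4f}")
```

Because the two variances differ by more than an order of magnitude, an analysis that treats both estimates as equally precise, as an unweighted t test or analysis of variance does, would be hard to justify.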

Hedges (1982a) further indicated that when the effect sizes themselves
are not similar a pooled or average estimate can be misleading. The
original idea of Glass's meta-analysis was improved by Hedges's (1981)
methods for distinguishing among effect sizes for studies which do not share
a common (population) effect size. In our analyses we estimate several
simple categorical (ANOVA-like) and regression-like models and examine
whether any of them adequately explains variability in the sizes of the 
gender differences. 

Methods 

The Collection of Studies

Published studies examining sex differences in science achievement from 
both of Steinkamp and Maehr's earlier reviews (1983, 1984) are included in 
this review. A total of 120 distinct sources were identified from the 
bibliographies provided by Steinkamp and Maehr (1983, 1984). Though
Steinkamp and Maehr also obtained effect-size estimates from test manuals 
and large scale studies they did not provide references for those sources. 
Thus the review was confined to the published sources of data listed in the 
bibliographies. Six dissertations and three unpublished manuscripts (not
available through ERIC) were excluded from the review. 

One hundred and eight published articles and ERIC documents were 
retrieved, and 42 of those sources were identified as having examined sex 
differences in science achievement. Each article was then read, effect-size 
measures were extracted from as many distinct independent samples as 
possible, and relevant study features were coded. (The original data from 
the Steinkamp and Maehr syntheses was requested, but was not available.)

One shortcoming of the data set examined here is that it does not 
contain many samples representing results from England. In particular, the 
work of Kelly (e.g., Kelly, 1978) and others on the Girls into Science and 
Technology project is an important piece of the research on science and 
gender. We plan to incorporate the data into our analyses as soon as it can 
be obtained. 
Study Coding 

Study features. Numerous study characteristics were coded for the 48
samples in the final collection. Table 1 presents a list of the study
features used in our analyses.

Table 1

Features of Studies

Study feature                  Categories

Subject-matter content         General science
                               Biology
                               Chemistry
                               Geology/earth sciences
                               Physics/physical science

Testing design                 Analysis of covariance
                               Posttest only
                               Pretest/posttest (change)

Type of achievement measure    Constructed for the study
                               Locally standardized
                               Standardized
                               Course grade

Number of items on measure
School grade of subjects

Of primary interest were the features of studies which related to
measurement issues and to study design. Several features of the instruments
used to measure science achievement were coded, including the subject-matter
(content) tested, the reliability of the instrument, the test length, and
the type of test which was used (e.g., standardized tests versus course
grades).

The subject-matter content of the measures was determined through 
descriptions given in the studies and inspection of the instruments. Many 
articles described the tests in detail or presented all of the test items 
(or at least sample items). Even when detailed information was not 
available, titles often described the general content of the measures, and 
more specific information was sometimes available in test manuals or
handbooks (for standardized tests especially). When a test contained a
mixture of several specific science content areas (e.g., Bowyer & Linn,
1978) it was categorized as a test of general science. Several measures
specifically labeled as tests of general-science ability (e.g., Field &
Cropley, 1969) were also categorized into this grouping.

Additional features coded included the date of data collection and 
publication for each study, the nationality of the subjects, the source of 
the study (e.g., journal versus book), and several others. 

Subjects of the studies were typically primary or secondary school 
students; only six samples of college students were included in the review. 
School grade level of the students was noted, and all college samples were 
assigned a grade of "13," since most often their actual college class level 
was not indicated. 

Effect sizes. Sample sizes were obtained or estimated from all
studies, and 30 effect sizes were extracted from samples which provided
sufficient data for their calculation. We chose to represent the sex
difference in the effect-size metric rather than as a correlation because
effect sizes are more easily interpreted than correlation coefficients.
That is, effect sizes directly represent the difference in means between the
sexes. Furthermore, the distribution of the effect size is approximately
normal even when its population mean is nonzero, whereas the same is not
true for the correlation. 

Glass's (1976) effect size was calculated for each study. The mean 
difference between males and females on the science outcome measure (denoted 
here as Y) is used instead of the mean difference between treatment and 
control groups. A positive effect size represents a male advantage on the 
science achievement measure. The formula for Glass's effect size for the
ith of a set of k studies is:

$$ g_i = \frac{\bar{Y}_i^M - \bar{Y}_i^F}{S_i}, \tag{1} $$

where $S_i$ is the pooled standard deviation from the usual two-sample t test
for male and female groups. When only t or F statistics were presented in
the studies, g was calculated as $\sqrt{(n_i^M + n_i^F)/(n_i^M n_i^F)}\; t$, which is
algebraically equivalent to (1).
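
The equivalence between the two routes to g can be checked numerically. The following is a minimal sketch rather than the authors' own computation; the function names and the summary statistics are hypothetical.

```python
import math


def pooled_sd(n_m, sd_m, n_f, sd_f):
    """Pooled standard deviation from the usual two-sample t test."""
    return math.sqrt(((n_m - 1) * sd_m ** 2 + (n_f - 1) * sd_f ** 2) / (n_m + n_f - 2))


def g_from_means(mean_m, mean_f, n_m, sd_m, n_f, sd_f):
    """Effect size as the male-minus-female mean difference over the pooled SD."""
    return (mean_m - mean_f) / pooled_sd(n_m, sd_m, n_f, sd_f)


def g_from_t(t, n_m, n_f):
    """Effect size recovered from a two-sample t statistic."""
    return t * math.sqrt((n_m + n_f) / (n_m * n_f))


# Hypothetical summary statistics for one sample.
n_m, n_f = 40, 50
mean_m, mean_f = 52.0, 50.0
sd_m, sd_f = 10.0, 9.0

s = pooled_sd(n_m, sd_m, n_f, sd_f)
t = (mean_m - mean_f) / (s * math.sqrt(1 / n_m + 1 / n_f))

g_direct = g_from_means(mean_m, mean_f, n_m, sd_m, n_f, sd_f)
g_via_t = g_from_t(t, n_m, n_f)
print(g_direct, g_via_t)  # the two values agree up to floating-point error
```

A positive value indicates a male advantage, matching the sign convention used in this review.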

In many cases studies did not present the raw means and standard
deviations needed to compute g as it is shown in (1). In such cases
algebraic transformations were used (e.g., Glass, McGaw, & Smith, 1981, pp.
93-152) to obtain g from available data. When analysis of variance summary
statistics were presented, sums of squares for between-subjects terms (other
than gender) were repooled into the error sum of squares to give an error
estimate comparable to $S_i$.
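
The repooling step can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of the procedure in Glass, McGaw, and Smith (1981): the function name and the analysis of variance summary are hypothetical, and treating the interaction with gender as one of the terms to be repooled is an assumption.

```python
import math


def repooled_sd(ss_error, df_error, other_terms):
    """Standard deviation after adding other between-subjects terms back to error.

    other_terms is a list of (sum_of_squares, degrees_of_freedom) pairs for
    every between-subjects term except gender.
    """
    ss = ss_error + sum(ss_term for ss_term, _ in other_terms)
    df = df_error + sum(df_term for _, df_term in other_terms)
    return math.sqrt(ss / df)


# Hypothetical gender-by-treatment analysis of variance summary.
ss_error, df_error = 4200.0, 56
other_terms = [(300.0, 1),   # treatment main effect
               (100.0, 1)]   # gender-by-treatment interaction (assumed repooled)

s_repooled = repooled_sd(ss_error, df_error, other_terms)
s_residual = math.sqrt(ss_error / df_error)
print(s_repooled, s_residual)
```

In this example the repooled estimate is larger than the residual standard deviation, so an effect size based on the residual term alone would overstate the gender difference relative to one computed from a simple two-group comparison.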

When point-biserial correlations of subject gender with the science
outcome were reported (e.g., Doran & Sellers, 1978), approximate g values
were obtained via

$$ g_i' = \frac{\bar{Y}_i^M - \bar{Y}_i^F}{S_i'} = \frac{(n_i^M + n_i^F)\, r_{GY}}{\sqrt{n_i^M n_i^F}}, \tag{2} $$
where $r_{GY}$ is the point-biserial correlation between gender and the science
outcome and $n_i^M$ and $n_i^F$ are the sample sizes for males and females. This
effect size may differ from g in that its denominator $S'$ tends to be larger
than the pooled standard deviation $S_p$ when gender differences are large. In
these studies, however, it appears that the difference between $S'$ and $S_p$ is
negligible. The mean ratio of $S'$ to $S_p$ in the ten samples in which both
standard deviations could be computed was 1.012, which has little influence
on the value of $g'$. Thus $g'$ values were considered equivalent to the g
values computed as in (1).
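
The conversion from a point-biserial correlation can be illustrated with a small numerical check. The sketch below assumes that the denominator of the approximate effect size is the standard deviation of all scores taken together, computed with divisor N; under that assumption the mean difference divided by this standard deviation equals the correlation multiplied by the total sample size and divided by the square root of the product of the two group sizes. The function names and the scores are hypothetical.

```python
import math


def g_prime_from_r(r_gy, n_male, n_female):
    """Approximate effect size from a point-biserial correlation."""
    return r_gy * (n_male + n_female) / math.sqrt(n_male * n_female)


def point_biserial(males, females):
    """Pearson correlation between a 0/1 gender code (male = 1) and the scores."""
    scores = males + females
    gender = [1] * len(males) + [0] * len(females)
    n = len(scores)
    mean_y = sum(scores) / n
    mean_g = sum(gender) / n
    cross = sum((g - mean_g) * (y - mean_y) for g, y in zip(gender, scores))
    ss_g = sum((g - mean_g) ** 2 for g in gender)
    ss_y = sum((y - mean_y) ** 2 for y in scores)
    return cross / math.sqrt(ss_g * ss_y)


# Hypothetical science scores for one sample.
males = [12, 15, 11, 14, 13, 16]
females = [11, 13, 12, 10, 14]

r_gy = point_biserial(males, females)
g_prime = g_prime_from_r(r_gy, len(males), len(females))

# Direct computation: mean difference over the total standard deviation (divisor N).
scores = males + females
mean_all = sum(scores) / len(scores)
s_total = math.sqrt(sum((y - mean_all) ** 2 for y in scores) / len(scores))
g_direct = (sum(males) / len(males) - sum(females) / len(females)) / s_total

print(g_prime, g_direct)  # agree up to floating-point error
```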

Hedges (1981) found that Glass's estimator g has a small-sample bias,
and he obtained a corrected effect size $d_i$, which is the minimum variance
unbiased estimator of $\delta$. The unbiased estimator, corrected for small-sample
bias and unreliability, is

$$ d_i = \left(1 - \frac{3}{4m_i - 1}\right) \frac{g_i}{\sqrt{r_i}}, $$

where $m_i$ is $n_i^M + n_i^F - 2$ and $r_i$ is the reliability estimate for the science
achievement measure (Y) used in the ith study. When test reliability was
not reported we used as $r_i$ the average reliability for the rest of the
studies, .82. We use the corrected effect sizes (d's) in our analyses.
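
The two corrections can be applied in a few lines. This is a minimal sketch of the formula as described in the text; the function name and the example values of g and the group sizes are hypothetical, while the default reliability of .82 is the average reported above.

```python
import math


def corrected_effect_size(g, n_male, n_female, reliability=0.82):
    """Effect size corrected for small-sample bias and for unreliability."""
    m = n_male + n_female - 2                   # degrees of freedom of the pooled SD
    small_sample_factor = 1 - 3 / (4 * m - 1)   # shrinks g slightly toward zero
    return small_sample_factor * g / math.sqrt(reliability)


# Hypothetical study: g = 0.30 with 20 males and 20 females, reliability not reported.
d = corrected_effect_size(0.30, 20, 20)
print(d)
```

The small-sample factor pulls the estimate toward zero, whereas dividing by the square root of a reliability below 1 pushes it away from zero; in this example the second adjustment dominates, so the corrected value is larger than g.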
Model Fitting and Estimation

The analyses reported in our paper are based on Hedges's (1982a, b)
tests for fitting categorical models to effect sizes. Our procedure was to
first ask whether the size of the gender difference on science achievement
(across all the studies) was consistent. If the results were inconsistent,
then studies were categorized according to one (or more) of the study
features listed in Table 1. The agreement among results within each subset 
of results was then examined, as were possible differences between the 
groups. 

When effect sizes within the groups appeared fairly similar, the 
analyses were stopped; if not, further subdivision of the groups continued
until a sensible model was found or until the selected predictor variables
were exhausted.

Our analyses differ from those of Steinkamp and Maehr because we not 
only examine between-study differences in the magnitudes of gender effects, 
but we follow with an examination of whether nonrandom variation remains 
within the sets of effects being considered. This allows us to address the 
question of whether an "adequate" explanation or model for the results has 
been found. 
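
The model-fitting sequence described above can be sketched in code. The version below assumes the usual inverse-variance weighting, in which each homogeneity statistic is a weighted sum of squared deviations from a weighted mean and the total statistic splits into between-groups and within-groups parts. It is an illustration rather than the authors' program; the function names, effect sizes, and variances are hypothetical.

```python
def weighted_mean(effects, variances):
    """Inverse-variance weighted mean of a set of effect sizes."""
    weights = [1.0 / v for v in variances]
    return sum(w * d for w, d in zip(weights, effects)) / sum(weights)


def homogeneity(effects, variances):
    """Weighted sum of squared deviations from the weighted mean."""
    mean = weighted_mean(effects, variances)
    return sum((d - mean) ** 2 / v for d, v in zip(effects, variances))


def partition(groups):
    """Return (total, between, within) homogeneity for grouped (d, variance) pairs."""
    all_d = [d for pairs in groups.values() for d, _ in pairs]
    all_v = [v for pairs in groups.values() for _, v in pairs]
    grand_mean = weighted_mean(all_d, all_v)
    h_total = homogeneity(all_d, all_v)
    h_between = 0.0
    h_within = 0.0
    for pairs in groups.values():
        d = [p[0] for p in pairs]
        v = [p[1] for p in pairs]
        h_within += homogeneity(d, v)
        group_weight = sum(1.0 / x for x in v)
        h_between += group_weight * (weighted_mean(d, v) - grand_mean) ** 2
    return h_total, h_between, h_within


# Hypothetical effect sizes and variances grouped by subject matter.
groups = {
    "biology": [(0.10, 0.02), (0.18, 0.01), (0.15, 0.03)],
    "physics": [(0.30, 0.02), (0.40, 0.01)],
    "chemistry": [(-0.12, 0.04)],
}
h_total, h_between, h_within = partition(groups)
print(h_total, h_between, h_within)
```

With k effect sizes in p groups, the total, between-groups, and within-groups statistics would be referred to chi-square distributions with k - 1, p - 1, and k - p degrees of freedom. A group containing a single study contributes nothing to the within-groups statistic.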

Results 

Effect Sizes 

We first tested the homogeneity of the results of the gender 
differences. Note that we omit one of the two effect sizes from 
Marjoribanks (1976) in order to have only independent data in the analysis.
Table 2 shows the analysis of the set of 29 effects.

The homogeneity test value was 101.39, which, as a chi-square
variable with k - 1 = 28 degrees of freedom, is quite large (p < .001). The
effect sizes cannot all be represented with one population parameter. This
does not seem surprising since the biased uncorrected effect sizes ranged
from -0.36 to 0.43.

The average effect size for all studies is estimated to be 0.16
standard deviations, which differs from zero (p < .05). This value is lower
than that reported by Steinkamp and Maehr (1983). Their correlational 
studies produced an average correlation of 0.16, which corresponds 
approximately to an average effect size of 0.32. Though this indicates that 
males are on average achieving higher science scores than females, the value 
is an average and not a common effect size value. Some studies show more of 
an advantage for males and others show less; some may also show female 
superiority. 
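
The paper does not state which conversion was used to express the average correlation as an effect size. One common choice, which assumes two groups of equal size, reproduces the reported figure; the function name below is hypothetical.

```python
import math


def r_to_d(r):
    """Convert a point-biserial correlation to a standardized mean difference.

    Assumes equal group sizes: d = 2r / sqrt(1 - r**2).
    """
    return 2 * r / math.sqrt(1 - r ** 2)


d_from_r = r_to_d(0.16)
print(round(d_from_r, 2))  # prints 0.32
```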

We next grouped the effects according to the subject-matter content of
the achievement test. Table 2 also shows the homogeneity statistics
obtained for this first categorical analysis (Hedges, 1982b). An overall
test of the within-groups homogeneity, $H_W$, is the sum of the homogeneity
values for each subgroup. Its value, 50.34, is significant at the .001
level (df = 24). Thus there is still considerable variation in the sizes of
the gender differences within the subject-matter subgroups. However, Table
2 shows that the results within the five subject-matter categories are, for
the most part, consistent. Only the gender differences based on tests
labeled as general science are not homogeneous. Thus most of the variation
in the overall $H_W$ statistic results from the differences among the studies
of general science.

Table 2

Subject-Matter Differences Between Effect Sizes

                                       Test of                 Mean effect-size
Source                          df     homogeneity   p value   estimate (s.e.)

Total                           28       101.39      <.001      0.16 (0.02) *
Between subject-matter groups    4        51.05      <.001
Within groups                   24        50.34      <.001
  General science               10        30.68      <.005      0.07 (0.05)
  Biology                        5         4.54      ns         0.14 (0.04) *
  Chemistry                      0         0.00                -0.12 (0.06)
  Geology                        4         6.79      ns         0.10 (0.06)
  Physics                        5         8.33      ns         0.35 (0.03) *

The test for differences between mean effect sizes for the
subject-matter groups is given by $H_B$, which is also a chi-square variable, with 4
degrees of freedom (one less than the number of categories considered). We
conclude that the five sets of gender differences have different population
effect sizes, since $H_B$ = 51.05 is significant.
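
The additive structure of the homogeneity statistics in Table 2 can be verified directly from the tabled values. The short check below uses only numbers reported in the table; the variable names are hypothetical.

```python
# (degrees of freedom, homogeneity statistic) for each subject-matter group in Table 2.
within_groups = {
    "General science": (10, 30.68),
    "Biology": (5, 4.54),
    "Chemistry": (0, 0.00),
    "Geology": (4, 6.79),
    "Physics": (5, 8.33),
}
h_total_reported = 101.39
h_between_reported = 51.05

h_within_sum = sum(h for _, h in within_groups.values())
df_within_sum = sum(df for df, _ in within_groups.values())

print(f"within-groups statistic: {h_within_sum:.2f} on {df_within_sum} df")
print(f"between + within: {h_between_reported + h_within_sum:.2f}")
```

The subgroup values sum to the reported within-groups statistic, and the between-groups and within-groups statistics together account for the total, so the table is internally consistent.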
Mean gender differences for two of the subject-matter groups were
significantly greater than zero. The effects for studies of biology and 
physics both showed advantages for males, of 0.14 and 0.35 standard 
deviations, respectively. There are no significant differences between 
males and females on either geology or chemistry, though the single study of 
chemistry shows a female advantage which is significant with a lenient 
significance level of .10. 

We should note here that the effect size value for physics achievement 
from the omitted Marjoribanks (1976) study (with a value of 0.00) does not 
conform with the results presented in Table 2. In fact, when the 
Marjoribanks sample is included in the analysis of physics outcomes, the
within-group homogeneity test for physics increases to 28.00, which is
highly significant as a chi-square variable with 6 degrees of freedom. Also
the estimated average effect size is reduced to 0.29 when this study is
added. Nonetheless the effect size is still larger than those from the
other subject-matter areas.

We next subdivide the general-science studies according to the school
grade of the subjects. Subgroups were elementary schoolers (grades 1
through 6), junior-high students (grades 7 through 9), senior-high students
(grades 10 through 12), and college students. (No linear relationship of
grade to the gender difference was found; thus a categorization of grade was
used to mesh with the subject-matter categorical analysis.)

The homogeneity statistics for studies of general science divided by 
grade are shown in Table 3. Only studies of junior-high groups share a 
common population effect size, which was significantly different from zero 
and indicates more than a quarter of a standard deviation advantage for 
males. The effect sizes on general science for elementary or senior-high 
subjects are still inconsistent, and are on average smaller than those for 
the junior-high subjects.

Table 3

Analysis of Gender Differences in General Science by Grade of Subjects

                          Test of                 Mean effect-size
Source             df     homogeneity   p value   estimate (s.e.)

General science    10        30.68      <.005      0.07 (0.05)
  Elementary        5        13.70      <.02       0.02 (0.05)
  Junior high       1         0.67      ns         0.29 (0.11) *
  Senior high       2         9.54      <.01       0.17 (0.11)
Between grade       2         6.77      <.05


Since grouping the studies by grade for general science does not fully
explain the variations in gender differences, we explored the use of another
study feature as a grouping variable: the type of measure used. (The
studies of general science had used standardized and locally made tests, and
tests made specifically for the study.)

We find that the results of studies using standardized or locally made 
measures are consistent. However, studies using measures constructed 
specifically for the research have quite inconsistent results. Homogeneity 
statistics for the general-science studies grouped by the type of measure
used are shown in Table 4. 

Table 4

Analysis of Gender Differences in General Science by Type of Measure

                          Test of                 Mean effect-size
Source             df     homogeneity   p value   estimate (s.e.)

General science    10        30.68      <.005      0.07 (0.05)
  Standardized      2         4.21      ns        -0.07 (0.12)
  Local             0         0.00      ns         0.35 (0.14) *
  Made for study    6        20.73      <.005      0.08 (0.06)
Between measures    2         5.74      <.05

Again we find that though results for some of the subgroups in the
analysis are homogeneous, they still seem to vary considerably within one
group. Here, results are still quite varied within the category of general 
science tests which were constructed specifically for the study in question. 

Mean effects differed significantly between the three measure-type
groups, with a small (though non-significant) superiority for girls found in
studies using standardized tests. The one study using a locally
standardized test showed over a third of a standard deviation advantage for
males, though the generalizability of this finding is questionable since it
is based on only one study.

Again, considerable variability in the results remains for one category 
of studies, studies using measures constructed specifically for the research 
project reported. The tests in this category included some with content 
from several domains within science, and varied in format as well. For 
example, the Scientific Literacy Test used by Bowyer and Linn (1978) was a 
pencil-and-paper test of content and process topics based on the goals of
the Science Curriculum Improvement Study. On the other hand, the TAB 
Inventory of Science Processes used by Thomas and Snider (1969) was based on 
a sample of student behavior as he or she solved a science problem. Because 
the category "general science" was used, in a sense, as a catch-all category 
in the coding of the subject-matter content of the measures, the remaining 
amount of variation in the results is somewhat expectable. 

We did no further subdivision of the studies because the number of 
remaining studies is quite small after the studies of general science are 
classed by grade level or by type of measures. 

In order to examine the question of competing hypotheses as 
explanations for the gender differences in science achievement we considered 
several other categorical and regression-like models for the gender
differences. Two other simple models involved classifying the studies
according to either the grade level of the students or the type of measure 
used in the study. There were significant between-grade and between-measure 
differences, but there was also considerable variation within subgroups. 
For the measure-type analysis, though, all of the excess variation was 
concentrated in the largest subgroup of studies, those using author-made 
measures. 

The predictors of publication date, study date, and test length did not 
appear related to the size of the gender difference, either in a linear or 
nonlinear fashion. Similarly, classifying the studies according to either 
the nationality of the subjects or the design of the study (e.g., posttest 
versus change-score analysis) did not produce a well-fitting explanatory 
model. Since the subject-matter predictor is the most salient predictor
theoretically, it is reasonable that it provides the best-fitting model for
the results.

Discussion 

Gender differences for all subject-matter groupings except for studies 
of general science are consistent, and the average gender differences are 
all less than one half of a standard deviation. In physics and biology boys 
tend to do significantly better than girls, and the sizes of the gender 
differences are about one-third of a standard deviation for physics and 
about one sixth of a standard deviation for biology. 

These results resemble those found by Finn, Dulberg, and Reis (1979), 
whose cross-cultural examination of the science achievement of adolescents 
showed smaller sex differences for biology than for other science areas. 
However, our gender differences do not appear to increase with age and are
much smaller in absolute magnitude than those reported by Finn, Dulberg and 
Reis. Some have suggested that the larger gender differences in physics 
result from girls' lack of mathematical skills. Unfortunately, our analyses
are unable to shed any light on that matter.

Much of our analysis involved the examination of the studies of general 
science, and consideration of competing models to explain the gender 
differences across all studies. It is understandable that the results for 
studies of general science are inconsistent. Most of the general science
measures seemed to vary internally in content and format. In part we had 
coded these measures as "general science" because they did not have specific 
subject-matter content. Differences due to variation in quality and content 
of the author-made measures also may have contributed to the heterogeneity 
of the general -science group. 

It was interesting to find that other potential explanatory factors did 
not account for the variation in effect sizes as well as the subject-matter 
factor. Significant differences were found between some other groupings of 
studies (e.g., grade-level and measure-type groupings), though these 
categorizations left much variation unexplained. 
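
The idea that a grouping can separate studies yet leave "much variation unexplained" can be made concrete with the usual partition of the homogeneity statistic into between-group and within-group parts. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, and the effect sizes and variances are made-up values, not data from this review.

```python
def q_statistic(effects):
    """Weighted sum of squared deviations from the weighted mean.

    effects: list of (g, variance) pairs.
    """
    weights = [1.0 / v for _, v in effects]
    mean = sum(w * g for w, (g, _) in zip(weights, effects)) / sum(weights)
    return sum(w * (g - mean) ** 2 for w, (g, _) in zip(weights, effects))

def partition_q(groups):
    """Split total heterogeneity into between-group and within-group parts.

    groups: dict mapping a group label to a list of (g, variance) pairs.
    """
    pooled = [pair for effects in groups.values() for pair in effects]
    q_total = q_statistic(pooled)
    q_within = sum(q_statistic(effects) for effects in groups.values())
    q_between = q_total - q_within
    return q_total, q_between, q_within

# Illustrative (made-up) effect sizes and variances for two groupings.
example_groups = {
    "group A": [(0.40, 0.01), (0.30, 0.02), (0.45, 0.015)],
    "group B": [(0.10, 0.01), (-0.20, 0.02), (0.35, 0.015)],
}

q_total, q_between, q_within = partition_q(example_groups)
print(f"Q_total = {q_total:.2f}, Q_between = {q_between:.2f}, Q_within = {q_within:.2f}")
```

Under the usual large-sample assumptions, Q_between is referred to a chi-square distribution with (number of groups - 1) degrees of freedom and Q_within to one with (number of studies - number of groups) degrees of freedom. A grouping fits well only when Q_between is large and Q_within is small, so a significant Q_between alongside a large Q_within corresponds to the situation described above.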

Similarly, nationality of the students, study design, number of test 
items, and date of publication did not relate to the gender differences. 
Thus in our data the nationality of the students does not appear to be an 
important factor in explaining gender differences, and for the samples 
investigated here the length of the test did not relate to the size of the 
gender difference. In most cases, however, it was not possible to determine 
whether the test was speeded, so the question of whether gender differences 
are due to test speed (independent of test length) cannot be addressed. 

All of these findings suggest a few conclusions. One is that the 
degree of gender differences in achievement varies significantly across 
subject-matter areas in science. This suggests that care should be taken to 
distinguish between content areas when discussing or researching science 
achievement and gender. 

Secondly, much of the variation in results appeared for tests 
constructed by study authors for the express purpose of their research. 
There are many possible sources of bias inherent in tests that have not been 
rigorously scrutinized, as (presumably) most standardized tests have been. 
Some author-made tests in this review do not appear to have been 
pilot-tested before their use in the published research. Though our present 
data did not permit such an analysis, a closer examination of the measures 
of science achievement, especially of general science achievement, may be 
warranted. 
Table 1 

Results of the Studies of Science Achievement 

                                Subject        Type of  Relia-                School            Number
Study                           matter         measure  bility   n(M)   n(F)   grade       g  of items

Allen (1970)                    General        Stdzd         0    150    150       1       6       999
Allen (1972)                    General        Author        0    105    106       2       6        14
Allen (1973)                    General        Author        0    100     76       3    0.42        25
Allen (1975)                    General        Author        0    162    162       5       8       999
Anderson (1980)                 Physics        Author        0     82     53       6       8        36
Ashbaugh (1968)                 Geology        Author      .84     52     54       4    0.43        40
Ashbaugh (1968)                 Geology        Author      .84     46     47       5    0.00        40
Ashbaugh (1966)                 Geology        Author      .84     47     47       6    0.26        40
Babikian (1971)                 Physics        Author      .76    108    108       8    0.36        38
Bowyer & Linn (1978)            General        Author      .91    284    247       6   -0.15        28
Bridgham (1969)                 Physics        Author      .66     29     21       3    0.42        30
Brown, Michaels &
  Bledsoe (1965)                Biology        Author      .90    112    111      13       7        60
Carnes, Bledsoe &
  VanDeventer (1967)            General        Author      .84    110    111       7       9        50
Clarke (1972)                   General        Stdzd         0    415    361       5       7       999
Doran & Ngoi (1979)             Physics        Author      .64    101    101       6       9       999
Doran & Sellers (1978)          Biology        Unsure      .90    160    160       4    0.22       999
Field & Cropley (1969)          General        Local         0    104     74      11    0.38       999
Finger, Dillon & Corbin (1965)  Physics        Grade         0    137     42      13       8       999
Fuller, May & Butts (1979)      Biology        Author      .97     66     67       3    0.37        40
Hart (1978)                     Biology        Author      .67    150    150      12    0.12        35
Keeves (1975)                   General        Local         0    107    108       7    0.32       999
Kempa & Ward (1975)             Chemistry      Author      .69    127     13       9       9        20
Kruglak (1970)                  Physics        Stdzd         0    650    230      13       8       999
Lynch, Benjamin, Chapman,
  Holmes, McCammon,
  Smith, & Symmons (1979)       Physics        Author      .71    969    666       6    0.22        16
Lynch & Patterson (1980)        Physics        Author        0    969    666       8       6        16
Marek (1981)                    Biology        Author      .73     37     55      10    0.18       102
Marjoribanks (1976)             Physics        Author      .94    201    195       6   -0.12       999
Marjoribanks (1976)             Biology        Author      .93    201    195       6    0.00       999
Marjoribanks (1978)             Biology        Author      .93    219    210       6       6       999
McDuffie & Beehler (1978)       General        Local       .82    196    197              -8        40
Ogden & Brewster (1977)         General        Stdzd       .88     63     20      11    0.28        60
Ogden & Brewster (1977)         General        Stdzd       .88     41     50      11   -0.36        60
Raven & Adrian (1978)           General        Stdzd       .85    132    117      10       7       999
Scott & Siegel (1965)           General        Author      .78     50     50       4   -0.05        20
Scott & Siegel (1965)           General        Author      .78     50     50       5   -0.18        20
Scott & Siegel (1965)           General        Author      .78     50     50       6    0.12        20
Shell (1970)                    General        Stdzd         0     16     52      13       7       999
Shrigley (1972)                 Earth Science  Author      .88     64     56       6    0.24        65
Sieveking & Savitsky (1969)     Chemistry      Stdzd       .96    498    498      13   -0.12       999
Skinner (1967)                  Geology        Author        0    458    430       5    0.02        46
Strope & Braswell (1966)        General        Author        0    104    103      13       8       999
Tamir (1974)                    Biology        Local         0    259    256      12       8       999
Tamir (1974)                    Biology        Local         0    259    256      12      -8       999
Tamir (1976)                    Biology        Author      .79    468    521      12    0.12        30
Tamir & Amir (1975)             Physics        Author      .74     65     51       1    0.36       999
Tamir & Amir (1976)             Physics        Author      .72     65     51       2    0.16       999
Thomas & Snider (1969)          General        Author      .82     45     41       8    0.13        60
Walberg (1969)                  Physics        Local       .77    675    375      12    0.42       999
Wallach & Kogan (1966)          General        Stdzd       .91     70     81       5   -0.03       999
Weisberg (1970)                 Biology        Author      .84     48     48       8       7        40



The codes for the variables are as follows: 

Reliability:      0 = not reported; mean reliability of 0.82 was substituted. 
Effect size (g):  6 = dependent; same subjects appear in another study. 
                  7 = data for g is given, but direction is not reported. 
                  8 = direction for g is given, but data is not reported (sign of 8 
                      indicates direction of the difference). 
                  9 = neither data nor direction of the difference is reported. 
                  Otherwise, the value of g is the computed effect size. 
Number of items:  999 = missing data. 
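
Because the g, reliability, and number-of-items columns mix real values with sentinel codes, anyone reanalyzing Table 1 has to separate the two before computing anything. The sketch below shows one way to do that. The function names and return shapes are hypothetical, and treating 9 as the "neither data nor direction" code is an assumption based on a partly illegible note in the scanned original.

```python
MEAN_RELIABILITY = 0.82  # value substituted when reliability was not reported
ITEMS_MISSING = 999      # sentinel for a missing number of items

# Sentinel codes used in the g column (absolute value of the entry).
G_CODES = {
    6: "dependent sample",
    7: "magnitude only",
    8: "direction only",
    9: "not reported",  # assumed code; the scanned note is partly illegible
}

def decode_reliability(raw):
    """Return (reliability, was_substituted)."""
    if raw == 0:
        return MEAN_RELIABILITY, True
    return raw, False

def decode_items(raw):
    """Return the number of items, or None when it is missing."""
    return None if raw == ITEMS_MISSING else raw

def decode_g(raw):
    """Return (effect_size, status, direction).

    effect_size is None for coded entries; direction is +1 or -1 only for
    code 8, whose sign carries the direction of the difference.
    """
    code = abs(raw)
    if code in G_CODES:
        direction = (1 if raw > 0 else -1) if code == 8 else None
        return None, G_CODES[code], direction
    return raw, "computed", None

for entry in (0.42, -0.15, 6, 7, 8, -8, 9):
    print(entry, decode_g(entry))
```

The lookup relies on every computed effect size in the table being well below 6 in absolute value, so a coded entry cannot be mistaken for a real one.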